Multi-camera device

ABSTRACT

This specification describes: using a first camera of a multi-camera device to obtain first video data of a first region; using a second camera of the multi-camera device to obtain second video data of a second region; generating a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; and generating an audio output from obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output.

FIELD

The present specification relates to capturing video and audio content using a multi-camera device, such as a suitable mobile communication device.

BACKGROUND

Mobile communication devices including video cameras are known. Moreover, it is known to provide such cameras on both the front and rear of a mobile communication device. Content from the front and rear cameras may then be presented next to each other. Such a presentation of visual data has implications for the presentation of some audio data relating to such visual data.

SUMMARY

In a first aspect, this specification describes a method comprising: using a first camera of a multi-camera device to obtain first video data of a first region; using a second camera of the multi-camera device to obtain second video data of a second region, the second camera being orientated in a different direction to the first camera, such that the first and second regions are at least partially different; generating a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; using the multi-camera device to obtain audio data, at least some of the audio data having a directional component; and generating an audio output from the obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output, wherein the first and second audio mappings correspond to the first and second video mappings respectively. The first and second portions of the multi-camera output may be presented side-by-side. The first and second portions of the multi-camera output may be presented with one data output on top of the other. The first camera may be a front camera. The second camera may be a rear camera.

The first and second video mappings may include modifying the first and second video data such that the first and second portions of the video output are narrower than the first and second video data.

Audio data having a directional component outside both the first and second regions may be excluded from the audio output. Alternatively, audio data having a directional component outside both the first and second regions may be included in the audio output as audio output without a directional component.

In a further alternative, audio data having a directional component outside both the first and second regions may be included in the audio output as audio output with a directional component, wherein the audio data having a directional component outside both the first and second regions undergoes a third audio mapping. The audio data having a directional component outside both the first and second regions may be stretched by the third audio mapping such that the area of the audio output corresponding to the area outside the first and second regions is wider than said area outside said first and second regions. Alternatively, or in addition, the first, second and third audio mappings may be such that the audio output provides a 360 degree audio output.

In some embodiments, using the multi-camera device to obtain the audio data may comprise using one or more spatial microphones or an array of microphones.

The method may further comprise a user indicating whether audio data associated with the first portion or the second portion of the multi-camera video output is to be boosted and/or attenuated. The user indication may be performed by a user contacting the first and/or the second portion of the multi-camera video output respectively.

In a second aspect, this specification describes an apparatus configured to perform any method as described with reference to the first aspect.

In a third aspect, this specification describes computer readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the first aspect.

In a fourth aspect, this specification describes a computer readable medium having computer-readable code stored thereon, the computer readable code, when executed by at least one processor, causing performance of: obtaining first video data of a first region using a first camera of a multi-camera device; obtaining second video data of a second region using a second camera of the multi-camera device, the second camera being orientated in a different direction to the first camera, such that the first and second regions are at least partially different; generating a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; obtaining audio data using the multi-camera device, at least some of the audio data having a directional component; and generating an audio output from the obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output, wherein the first and second audio mappings correspond to the first and second video mappings respectively.

In a fifth aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: use a first camera of a multi-camera device to obtain first video data of a first region; use a second camera of the multi-camera device to obtain second video data of a second region, the second camera being orientated in a different direction to the first camera, such that the first and second regions are at least partially different; generate a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; use the multi-camera device to obtain audio data, at least some of the audio data having a directional component; and generate an audio output from the obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output, wherein the first and second audio mappings correspond to the first and second video mappings respectively.

In a sixth aspect, this specification describes an apparatus comprising: means for obtaining first video data of a first region using a first camera of a multi-camera device; means for obtaining second video data of a second region using a second camera of the multi-camera device, the second camera being orientated in a different direction to the first camera, such that the first and second regions are at least partially different; means for generating a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; means for obtaining audio data using the multi-camera device, at least some of the audio data having a directional component; and means for generating an audio output from the obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output, wherein the first and second audio mappings correspond to the first and second video mappings respectively. The first and second portions of the multi-camera output may be presented side-by-side. The first and second portions of the multi-camera output may be presented with one data output on top of the other. The first camera may be a front camera. The second camera may be a rear camera. The means for obtaining audio data using the multi-camera device may comprise using one or more spatial microphones or an array of microphones.

The apparatus may comprise means for modifying the first and second video data to implement the first and second video mappings such that the first and second portions of the video output are narrower than the first and second video data.

Audio data having a directional component outside both the first and second regions may be excluded from the audio output. Alternatively, or in addition, audio data having a directional component outside both the first and second regions may be included in the audio output as audio output without a directional component.

In a further alternative, audio data having a directional component outside both the first and second regions may be included in the audio output as audio output with a directional component, wherein the audio data having a directional component outside both the first and second regions undergoes a third audio mapping.

The apparatus may comprise means for stretching the audio data having a directional component outside both the first and second regions by the third audio mapping such that the area of the audio output corresponding to the area outside the first and second regions is wider than said area outside said first and second regions.

The first, second and third audio mappings may be arranged such that the audio output provides a 360 degree audio output.

The apparatus may further comprise means for obtaining a user indication of whether audio data associated with the first portion or the second portion of the multi-camera video output is to be boosted and/or attenuated. The user indication may be performed by a user contacting the first and/or the second portion of the multi-camera video output respectively.

The said means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the performance of the apparatus.

In a seventh aspect, this specification describes a computer-readable medium (such as a non-transitory computer readable medium) comprising program instructions stored thereon for performing at least the following: using a first camera of a multi-camera device to obtain first video data of a first region; using a second camera of the multi-camera device to obtain second video data of a second region, the second camera being orientated in a different direction to the first camera, such that the first and second regions are at least partially different; generating a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; using the multi-camera device to obtain audio data, at least some of the audio data having a directional component; and generating an audio output from the obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output, wherein the first and second audio mappings correspond to the first and second video mappings respectively.

BRIEF DESCRIPTION OF THE DRAWINGS

Example embodiments will now be described, by way of non-limiting examples, with reference to the following schematic drawings, in which:

FIG. 1 is a block diagram of a system in accordance with an example embodiment;

FIG. 2 shows an example view output by the system of FIG. 1;

FIG. 3 shows an example view output by the system of FIG. 1;

FIG. 4 is a block diagram of a system in accordance with an example embodiment;

FIG. 5 is a block diagram of a system in accordance with an example embodiment;

FIG. 6 shows data captured and output by the system of FIG. 5 in accordance with an example embodiment;

FIG. 7 shows data output by the system of FIG. 5 in accordance with an example embodiment;

FIG. 8 shows data output by the system of FIG. 5 in accordance with an example embodiment;

FIG. 9 shows data output by the system of FIG. 5 in accordance with an example embodiment;

FIG. 10 shows data as manipulated in accordance with an example embodiment;

FIG. 11 shows data as manipulated in accordance with an example embodiment;

FIG. 12 shows data as manipulated in accordance with an example embodiment;

FIG. 13 shows data output by the system of FIG. 5 in accordance with an example embodiment;

FIG. 14 is a flow chart showing an algorithm in accordance with an example embodiment;

FIG. 15 shows user interaction with a data output in accordance with an example embodiment;

FIG. 16 is a block diagram of a system in accordance with an example embodiment;

FIG. 17 is an example view output by the system of FIG. 16;

FIG. 18 is an example view output by the system of FIG. 16;

FIG. 19 is a block diagram of components of a processing system in accordance with an example embodiment; and

FIGS. 20a and 20b show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which, when run by a computer, performs operations according to embodiments.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of a system, indicated generally by the reference numeral 10, in accordance with an example embodiment.

The system 10 comprises a user device 12, such as a mobile communication device (e.g. a mobile phone). The user device 12 has a front video camera 13 and a rear video camera 14. A first object 15 and a second object 16 are within a viewpoint of the front camera 13 (as indicated by dotted lines). A third object 17 and a fourth object 18 are within a viewpoint of the rear camera 14 (as indicated by dotted lines), with the fourth object 18 at least partially obscured by the third object 17. The third object 17 may, for example, be the user of the user device 12. A fifth object 19 is to the right of the user device and is outside the field of view of both the front camera 13 and the rear camera 14.

The user device 12 is an example of a multi-camera device. Other multi-camera devices could be used in accordance with the principles described herein.

FIG. 2 shows an example view, indicated generally by the reference numeral 20, output by the user device 12 described above with reference to FIG. 1. The view 20 is a combined view that includes a first view 21 that is provided by the front camera 13 and a second view 22 that is provided by the rear camera 14. As shown in FIG. 2, the combined view 20 displays the first and second views side-by-side.

The first view 21 includes a first image 23 and a second image 24. The first image 23 (on the left of the view 21) is a representation of the first object 15. The second image 24 is a representation of the second object 16. In a similar way, the second view 22 includes a third image 25 that is a representation of the third object 17 and a fourth image 26 that is a representation of the fourth object 18. The fifth object 19 is not displayed.

FIG. 3 shows an example view, indicated generally by the reference numeral 30, output by the user device 12 described above with reference to FIG. 1. The view 30 is a combined view that includes a first view 31 that is provided by the front camera 13 and a second view 32 that is provided by the rear camera 14. The combined view 30 differs from the combined view 20 described above in that the first and second views are shown one on top of the other in the combined view 30. As shown in FIG. 3, the first view 31 is displayed above the second view 32. In alternative embodiments, the second view could be displayed above the first view.

The first view 31 includes a first image 33 and a second image 34. The first image 33 (on the left of the view 31) is a representation of the first object 15. The second image 34 is a representation of the second object 16. In a similar way, the second view 32 includes a third image 35 that is a representation of the third object 17 and a fourth image 36 that is a representation of the fourth object 18. The fifth object 19 is not displayed.

Thus, the views 20 and 30 are similar, differing only in the arrangement of the first and second views that make up the views 20 and 30. (Whether the view 20 or the view 30 is used may depend on the orientation of the user device 12.)

FIG. 4 is a block diagram of a system, indicated generally by the reference numeral 40, in accordance with an example embodiment. The system 40 shows the elements of the system 10 described above (the user device 12 and the objects 15 to 19) in dotted lines. Also shown in the system 40, in solid form, are the effective positions of the visual representations of those objects as shown in FIG. 2 (i.e. the images 23, 24, 25 and 26).

As shown in FIG. 4, the visual representations 23 and 24 of the objects 15 and 16 are shifted to the left. Further, the visual representations 25 and 26 of the objects 17 and 18 are moved to positions in front of the user device (rather than behind it).

Spatial audio techniques are known in which a sound scene captured using an array of microphones is subjected to parametric spatial audio processing so that, during rendering, sounds are heard as if coming from directions around the user that match the video recordings. Such techniques are known, for example, in virtual reality or augmented reality applications. Such spatial audio processing may involve estimating the directional portion of the sound scene and the ambient portion of the sound scene.

The directional portion of the sound scene may comprise sound with an apparent direction of arrival (DOA), and may include direct sounds, such as the sounds of objects in the scene (e.g. speakers), and also early reflections from the walls or the floor. The ambient portion of the sound scene may comprise sounds without an apparent, strong directionality, such as diffuse reverberation. Analysis of the direction of arrival and the direct-to-ambient ratio may be performed on time-frequency tiles describing the spatial audio content at short temporal frames and at different frequencies (frequency bands). After analysing the directional portion and ambient portion of a spatial audio scene, the spatial audio scene can be represented in a suitable format, such as two audio signals and metadata describing the direction of arrival and diffuseness for each time-frequency tile. In playback, the direct portion and ambient portion of the spatial audio scene may be synthesized (rendered). For example, the direct portion of the sound scene may be rendered and spatially positioned with vector base amplitude panning (VBAP) such that it appears to emanate from the direction corresponding to the direction of arrival. The ambient portion may be rendered from all directions, for example from all output channels, such that it appears to emanate from everywhere and not from any specific direction. Decorrelation filtering may be applied to the output signals of the ambient signal portion so that the coherence between channels is minimized and the output signal becomes enveloping (surrounding the listener). Instead of VBAP, head-related transfer function (HRTF) filtering may be used if a binaural output suitable for headphone listening is desired.
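By way of illustration only, the following is a minimal sketch, in Python, of the two-dimensional VBAP gain computation mentioned above. The two-loudspeaker layout, the angle convention and the power normalisation are assumptions made for the purposes of the example and are not mandated by this specification.

    import numpy as np

    def vbap_2d_gains(source_azimuth_deg, speaker_azimuths_deg):
        # The unit vectors of the two loudspeakers spanning the source
        # direction form the columns of the matrix L.
        l1, l2 = np.radians(speaker_azimuths_deg)
        L = np.array([[np.cos(l1), np.cos(l2)],
                      [np.sin(l1), np.sin(l2)]])
        # Solve L @ g = p so that the gain-weighted sum of the speaker
        # vectors points in the source direction of arrival.
        p = np.array([np.cos(np.radians(source_azimuth_deg)),
                      np.sin(np.radians(source_azimuth_deg))])
        g = np.linalg.solve(L, p)
        return g / np.linalg.norm(g)  # normalise for constant loudness

    # Example: a direct sound at 10 degrees, panned between loudspeakers
    # at -30 and +30 degrees.
    print(vbap_2d_gains(10.0, (-30.0, 30.0)))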

If the sound scene is represented in a parametric format as above, certain transformations can be applied. A typical example is rotating the sound directions of arrival depending on user head rotation, such that the sound directions of arrival originate from fixed directions with regard to the world coordinates and do not rotate along with the user's head. In a similar manner, the direct sounds can be repositioned to a new direction of arrival by modifying the direction of arrival data to a new desired direction of arrival.
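A minimal sketch of such a rotation is given below. It assumes azimuth angles expressed in degrees and a hypothetical per-tile metadata structure (a list of dictionaries, one per time-frequency tile); the structure is illustrative rather than a format defined by this specification.

    def wrap_degrees(angle):
        # Wrap an angle to the interval [-180, 180).
        return (angle + 180.0) % 360.0 - 180.0

    def compensate_head_rotation(tiles, head_yaw_deg):
        # Counter-rotate the direction of arrival of each tile so that
        # sounds remain fixed in world coordinates as the head turns.
        for tile in tiles:
            tile['doa_deg'] = wrap_degrees(tile['doa_deg'] - head_yaw_deg)
        return tiles

    # Example: a sound recorded dead ahead appears 40 degrees off-centre
    # after the listener turns 40 degrees (sign convention assumed).
    print(compensate_head_rotation([{'doa_deg': 0.0}], 40.0))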

Instead of the above parametric spatial audio representation, spatial audio data could be represented as object-based data, where each sound object is represented as its own audio channel with position data. In this case, transforming the spatial audio data involves modifying the position data before rendering. The audio data could also comprise a combination of object-based data and channel bed data or ambisonics data. For example, the audio data could be in Moving Picture Experts Group (MPEG)-H 3D audio format, or in any other suitable format which facilitates some transformations. If the spatial audio data is in a format which does not facilitate any transformations, then the device could apply analysis and/or parameterization to the audio data so that it can be converted to a format which enables transforming at least the directions of arrival of sound sources or their portions.

Spatial audio may also be presented to a user wearing headphones and receiving audio only. For example, with spatial audio, objects may sound louder the closer the user is to the audio, and thus an improved user experience can be achieved with audio alone (i.e. without requiring the inclusion of video data).

Consider spatial audio recordings made by the user device 12 and presented in the system 40 shown in FIG. 4. If the audio is presented as a 3D sound scene, the sounds from the objects 15 to 19 will be heard as if coming from the locations shown in dotted lines in the system 40. Thus, there will be a mismatch between the visual views 23 to 26 and the corresponding audio.

FIG. 5 is a block diagram of a system, indicated generally by the reference numeral 50, in accordance with an example embodiment. The system includes the user device 12, first camera 13 and second camera 14 described above. The system 50 includes a first object 51 and a second object 52 within the field of view of the first camera 13 and a third object 53 and a fourth object 54 within the field of view of the second camera 14. The field of view of the first camera may be referred to as a first region, such that the first camera may be used to obtain data (e.g. video data) from the first region. Similarly, the field of view of the second camera may be referred to as a second region, such that the second camera may be used to obtain data (e.g. video data) from the second region.

As well as these visible objects, the system 50 also includes a fifth object 55, a sixth object 56, a seventh object 57 and an eighth object 58 that are not within the field of view of either camera of the user device. Of those, the fifth and sixth objects are to the left of the user device (as shown in FIG. 5) and the seventh and eighth objects are to the right of the user device (as shown in FIG. 5).

FIGS. 6 to 9 show data captured and output by the system 50 in accordance with various embodiments of the present invention.

FIG. 6 shows data, indicated generally by the reference numeral 60, captured and output by the system 50 in accordance with an example embodiment. The data 60 includes first, second, third and fourth visual representations 61 to 64 (and accompanying spatial audio data) of the first, second, third and fourth objects 51 to 54 respectively. The visual representations 61 and 62 are shifted to the left and compressed/squeezed closer together to cover a narrower area (as can be seen by comparing the positions of the representations 61 and 62 with the positions of the objects 51 and 52 shown in FIG. 5). Further, the visual representations 63 and 64 are moved to be next to the representations 61 and 62 (which can be viewed as being rotated by about 180 degrees). The representations 63 and 64 are also compressed/squeezed (as can be seen by comparing the positions of the representations 63 and 64 with the positions of the objects 53 and 54). Importantly, the spatial audio for the objects 51 to 54 is also required to be moved (and compressed/squeezed) by the same amounts so that the audio for the objects 51 to 54 appears to come from the same locations as the visual representations 61 to 64 respectively. Thus, the audio data undergoes a transformation corresponding to the transformation of the video data. In this way, the confusion described above as a result of a mismatch between audio and visual representations can be avoided. (The various transformations of the visual and audio representations may be referred to as video and audio mappings respectively.)
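In a parametric representation, applying such an audio mapping may amount to rewriting the direction-of-arrival metadata of each time-frequency tile with the same geometric transform that is applied to the video. The sketch below assumes the hypothetical tile structure used earlier and accepts the remapping function as a parameter.

    def remap_audio_to_video(tiles, remap_azimuth):
        # Rewrite each tile's direction of arrival with the transform
        # used for the video, so that sounds appear to come from the same
        # locations as the corresponding visual representations.
        for tile in tiles:
            tile['doa_deg'] = remap_azimuth(tile['doa_deg'])
        return tiles

    # Example: shift every direct sound by 30 degrees (a stand-in for
    # the shift-and-squeeze mapping of FIG. 6).
    tiles = remap_audio_to_video([{'doa_deg': 20.0}], lambda a: a - 30.0)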

In FIGS. 7 to 9, the video and audio data for the objects 51 to 54 (the representations 61 to 64) are identical to the representations in FIG. 6. FIGS. 7 to 9 also show different options for representing the audio data from the objects 55 to 58 that are not within the fields of view of the cameras 13 and 14, but for which spatial audio data may have been obtained.

FIG. 7 shows data, indicated generally by the reference numeral 70, output by the system 50 in accordance with an example embodiment. As noted above, the data 70 includes the video and audio representations 61 to 64 of the objects 51 to 54 respectively. FIG. 7 also shows first, second, third and fourth effective audio positions 71 to 74 for the fifth to eighth objects 55 to 58 respectively. (The audio representations 71 to 74 are indicated by squares to distinguish them from the video and audio representations 61 to 64.) As shown in FIG. 7, the representations 71 to 74 match the actual positions of the objects 55 to 58 respectively. Thus, no spatial audio data is represented as coming from the viewpoint of the rear camera 14.

FIG. 8 shows data, indicated generally by the reference numeral 80, output by the system 50 in accordance with an example embodiment. As noted above, the data 80 includes the video and audio representations 61 to 64 of the objects 51 to 54 respectively. FIG. 8 does not include audio data for the objects 55 to 58. Thus, in the example output 80, audio data from the objects 55 to 58 is either omitted entirely, or is not represented as directional spatial audio data (such audio data may, for example, be represented as ambient data, rather than directional spatial audio data).

FIG. 9 shows data, indicated generally by the reference numeral 90, output by the system 50 in accordance with an example embodiment. As noted above, the data 90 includes the video and audio representations 61 to 64 of the objects 51 to 54 respectively. FIG. 9 also shows first, second, third and fourth effective audio positions 91 to 94 for the fifth to eighth objects 55 to 58 respectively. As shown in FIG. 9, the representations 91 to 94 are moved compared with the actual positions of the objects 55 to 58 so that spatial audio data appears to come from all 360 degrees around the user. The moving of the representations 91 to 94 effectively stretches/fans out the zones outside the fields of view of the cameras 13 and 14 and matches the compression of the representations 61 to 64 (i.e. the zone within the fields of view of the cameras). Thus, the spatial data 90 differs from the spatial data 70 described above in that the audio data includes audio coming from the viewpoint of the rear camera 14.

FIG. 10 shows data, indicated generally by the reference numeral 100, as manipulated in accordance with an example embodiment. Specifically, FIG. 10 shows how the captured video and audio data of the system 50 described above with reference to FIG. 5 is adjusted to provide the representation 90 described above with reference to FIG. 9.

FIG. 10 includes the first to eighth objects 51 to 58 described above with reference to FIG. 5. The first and second objects 51 and 52 are within the field of view of the first camera 13 of the user device 12. Video and audio data for the first and second objects are rotated anti-clockwise as indicated by the arrow 101 and are also compressed/squeezed, so that they appear in the positions indicated by the video and audio representations 61 and 62 shown in FIG. 9. (As described above, the zone in which the representations 61 and 62 appear is smaller than the zone in which the objects 51 and 52 are located.) Similarly, the third and fourth objects 53 and 54 are within the field of view of the second camera 14 of the user device. Video and audio data for the third and fourth objects are rotated anti-clockwise as indicated by the arrow 102 (and optionally reversed) and are also compressed, so that they appear in the positions indicated by the video and audio representations 63 and 64 shown in FIG. 9. Again, the zone in which the representations 63 and 64 appear is smaller than the zone in which the objects 53 and 54 are located.

The mapping indicated by the arrow 101 may be referred to as a first video and audio mapping. Similarly, the mapping indicated by the arrow 102 may be referred to as a second video and audio mapping.

The fifth and sixth objects 55 and 56 are outside the fields of view of the first and second cameras and so only audio data is captured for those objects. The spatial audio data for the fifth and sixth objects is rotated anti-clockwise as indicated by the arrow 103 and is also moved (or stretched/fanned out), so that it appears in the positions indicated by the audio representations 91 and 92 shown in FIG. 9. (As described above, the zone in which the representations 91 and 92 appear is larger than the zone in which the objects 55 and 56 are located.) Similarly, the seventh and eighth objects 57 and 58 are outside the fields of view of the first and second cameras and so only audio data is captured for those objects. The spatial audio data for the seventh and eighth objects is rotated clockwise as indicated by the arrow 104 and is also moved (or stretched/fanned out), so that it appears in the positions indicated by the audio representations 93 and 94 shown in FIG. 9. Again, the zone in which the representations 93 and 94 appear is larger than the zone in which the objects 57 and 58 are located.

The compressions as a result of the rotations 101 and 102 match the expansions as a result of the rotations 103 and 104, such that the audio output data provides a 360 degree audio output. Thus, audio data can potentially be heard all around the user, which may give an adequate apparent 3-dimensional representation of the audio data in some circumstances. The mappings indicated by the arrows 103 and 104 may collectively be referred to as a third audio mapping.
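The combined effect of the first, second and third audio mappings can be illustrated as a piecewise remapping of source azimuths. The sketch below is one hypothetical realisation: it assumes that each camera covers a 120 degree field of view (front camera centred at 0 degrees, rear camera at 180 degrees), that each camera field is squeezed into a 90 degree output sector, and that each 60 degree out-of-view zone is fanned out into a 90 degree output sector; none of these angles is mandated by this specification.

    import numpy as np

    def wrap_degrees(angle):
        # Wrap an angle to the interval [-180, 180).
        return (angle + 180.0) % 360.0 - 180.0

    def map_azimuth_360(a):
        # Map a captured azimuth (degrees, 0 = straight ahead) to the
        # output so that the whole 360 degree circle remains covered.
        a = wrap_degrees(a)
        if -60.0 <= a <= 60.0:
            # First region (front field of view): squeezed from 120 to
            # 90 degrees and shifted into (-90, 0) (arrow 101).
            return np.interp(a, [-60.0, 60.0], [-90.0, 0.0])
        if a >= 120.0 or a < -120.0:
            # Second region (rear field of view): rotated to sit beside
            # the first region and squeezed into (0, 90) (arrow 102).
            u = (a - 120.0) % 360.0  # 0..120 across the rear field
            return np.interp(u, [0.0, 120.0], [0.0, 90.0])
        if 60.0 < a < 120.0:
            # Right out-of-view zone: fanned out into (90, 180)
            # (arrow 104).
            return np.interp(a, [60.0, 120.0], [90.0, 180.0])
        # Left out-of-view zone: fanned out into (-180, -90) (arrow 103).
        return np.interp(a, [-120.0, -60.0], [-180.0, -90.0])

Because the 180 degrees occupied by the two camera fields are squeezed into the front half of the output and the 120 degrees outside the fields are stretched over the rear half, the output covers the full circle, matching the 360 degree audio output described above.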

As described above, FIGS. 6 to 10 describe arrangements by which video and audio data from the system 50 described above with reference to FIG. 5 can be provided in a side-by-side arrangement, such as the display 20 described above with reference to FIG. 2. FIGS. 11 to 13 describe an arrangement by which video and audio data from the system 50 can be provided with one display on top of the other.

FIG. 11 shows data, indicated generally by the reference numeral 110, as manipulated in accordance with an example embodiment. The data 110 is based on the data of the system 50 described above. The data 110 includes first, second, third and fourth visual representations 111 to 114 (and accompanying spatial audio data) of the first, second, third and fourth objects 51 to 54 respectively. The data 110 also includes spatial audio representations 115 to 118 of the spatial audio data of the fifth to eighth objects 55 to 58 respectively that are outside the fields of view of the cameras 13 and 14.

The representations 111, 112, 115 and 117 are in the same positions as recorded (i.e. the same positions as the objects 51, 52, 55 and 57 respectively). The representations 113, 114, 116 and 118 (both visual and spatial audio data) are flipped so that they are a mirror image of the positions of the objects 53, 54, 56 and 58 (mirrored around a line 119 extending downwards from the camera 14).

FIG. 12 shows data, indicated generally by the reference numeral 120, as manipulated in accordance with an example embodiment. The data 120 differs from the data 110 in that the representations 113, 114, 116 and 118 shown in FIG. 12 (data relating to the third, fourth, sixth and eighth objects 53, 54, 56 and 58, i.e. data below a line 121 in FIG. 12) are rotated by 180 degrees (and then shifted down so that they still appear below the line 121 shown in FIG. 12).
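For the directional audio metadata, the mirroring of FIG. 11 and the 180 degree rotation of FIG. 12 reduce to simple azimuth operations. The sketch below is illustrative only; the angle convention (degrees, 0 = straight ahead) and the choice of mirror axis are assumptions.

    def wrap_degrees(angle):
        # Wrap an angle to the interval [-180, 180).
        return (angle + 180.0) % 360.0 - 180.0

    def mirror_azimuth(a, axis_deg=0.0):
        # Mirror a direction of arrival about a vertical axis, such as
        # the line 119 of FIG. 11.
        return wrap_degrees(2.0 * axis_deg - a)

    def rotate_azimuth(a, by_deg=180.0):
        # Rotate a direction of arrival, e.g. by 180 degrees as in
        # FIG. 12.
        return wrap_degrees(a + by_deg)

    # Applied in sequence to the representations 113, 114, 116 and 118:
    print(rotate_azimuth(mirror_azimuth(150.0)))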

FIG. 13 shows a display, indicated generally by the reference numeral 130, by which video and audio data from the system 50 can be provided with one display on top of another, as with the display 30 described above with reference to FIG. 3. The display 130 is based on the data 120 described above.

The display 130 includes a first region 131 and a second region 132 that correspond to the first and second views 31 and 32 described above with reference to FIG. 3. The first region 131 includes first and second visual representations (and accompanying spatial audio data) 141 and 142 of the first and second objects 51 and 52 that are within the first region 131. The second region 132 includes third and fourth visual representations (and accompanying spatial audio data) 143 and 144 of the third and fourth objects 53 and 54 that are visible within the second region 132. The first to fourth visual representations 141 to 144 are based on the representations 111, 112, 113 and 114 respectively described above with reference to FIG. 12.

The data 130 also includes fifth 145 and sixth 146 audio representations corresponding to the spatial audio data from the fifth and sixth objects 55 and 56. The audio from those objects is therefore presented in the data 130 as coming from the left of the displays 131 and 132. Similarly, the data includes seventh 147 and eighth 148 audio representations corresponding to the spatial audio data from the seventh and eighth objects 57 and 58. The audio from those objects is therefore presented in the data 130 as coming from the right of the displays 131 and 132. The fifth to eighth audio representations 145 to 148 are based on the representations 115, 116, 117 and 118 respectively described above with reference to FIG. 12.

FIG. 14 is a flow chart showing an algorithm, indicated generally by the reference numeral 150, in accordance with an example embodiment.

The algorithm 150 starts at operation 152, where data is obtained. The data obtained in operation 152 includes video data (e.g. obtained from the video cameras 13 and 14 described above) and audio data (e.g. the spatial and ambient audio data described above).

The data obtained in operation 152 is transformed in some way in operation 154 of the algorithm 150. The transformed data is then output in the output data operation 156.

Examples of the data transformation operation 154 include the transformations described above with reference to FIGS. 6 to 13.

FIG. 15 shows user interaction, indicated generally by the reference numeral 160, in accordance with an example embodiment. The user interaction 160 shows the first view 21 and the second view 22 of the example view 20 described above with reference to FIG. 2.

The user interaction 160 takes the form of a user's finger (indicated by the reference numeral 166) pressing on the second view 22. By holding a finger on the second output view 22, the user can attenuate the audio content from the indicated direction (i.e. the rear direction in the case shown in FIG. 15). Thus, all sounds coming from behind the user device are attenuated.
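One hypothetical way to realise such attenuation in a parametric representation is to scale every time-frequency tile whose direction of arrival falls within the indicated sector; the tile structure, the sector bounds and the 12 dB figure below are illustrative assumptions rather than requirements of this specification.

    def wrap_degrees(angle):
        # Wrap an angle to the interval [-180, 180).
        return (angle + 180.0) % 360.0 - 180.0

    def apply_sector_gain(tiles, lo_deg, hi_deg, gain):
        # Attenuate (gain < 1) or boost (gain > 1) tiles whose direction
        # of arrival lies within the user-indicated sector.
        for tile in tiles:
            if lo_deg <= wrap_degrees(tile['doa_deg']) <= hi_deg:
                tile['gain'] = tile.get('gain', 1.0) * gain
        return tiles

    # While the user holds the second view, attenuate everything behind
    # the device (azimuths beyond +/-90 degrees) by 12 dB:
    # tiles = apply_sector_gain(tiles, 90.0, 180.0, 10 ** (-12 / 20))
    # tiles = apply_sector_gain(tiles, -180.0, -90.0, 10 ** (-12 / 20))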

Of course, many variants to the user interaction 160 are possible. The user could indicate that the first view (the front view) should be attenuated (by pressing on the first view 21). Alternatively, the user indication could indicate that the indicated region should be boosted (rather than attenuated). Further, it is not necessary for the interaction to be indicated by a user's finger contacting the display. For example, a stylus could be used. Other selection options are also possible; for example, voice commands and keyboard or mouse instructions could be used.

The user interaction 160 has been described with reference to the side-by-side display 20. The same principles could, of course, be applied to the display format 30 in which one display is presented on top of the other.

By way of example, FIGS. 16 to 18 show an example system in which some of the principles described herein are applicable.

FIG. 16 is a block diagram, indicated generally by the reference numeral 400, of a system in accordance with some of the principles described herein. The system comprises a first object 401, a second object 402, a third object 403, a fourth object 404 and a fifth object 405. The first and second objects are within the field of view of a front camera of a user device 406. The third object 403 (which may be an operator of the user device 406) and the fourth object 404 are within the field of view of a rear camera of the user device. The fifth object 405 is outside the fields of view of the cameras.

FIG. 17 is an example view 410 output by the system 400 described above. The view 410 includes a first image 411 and a second image 412, which images are representations of the first object 401 and the second object 402 respectively, as taken by the front camera of the user device 406. The third to fifth objects are not visible in the view 410 since they are outside the field of view of the front camera of the user device 406.

Assume that the user device records spatial audio, including spatial audio data for the first to fifth objects 401 to 405. Such spatial audio can be recorded using an array of microphones. The spatial audio can be presented together with the view 410 so that the sounds in a recorded scene can be heard as if coming from the direction from which they were recorded.

FIG. 18 is an example view 420 output by the system 400 described above. The view 420 includes a first view 430 and a second view 440 presented side-by-side. The first view 430 includes a first image 421 and a second image 422, which images are representations of the first object 401 and the second object 402 respectively, as taken by the front camera of the user device 406. Similarly, the second view 440 includes a third image 423 and a fourth image 424, which images are representations of the third object 403 and the fourth object 404 respectively, as taken by the rear camera of the user device 406.

As described above with reference to FIG. 17, when recording a video with only one camera and spatial audio, the captured video and sound scene are aligned. Thus, if there is a sound-producing object in the front-centre of the video, it will be heard as if coming from the same direction during rendering. This is not the case with the view 420: the rendered audio will not match the rendered video. For example, video data from behind the user device 406 is rendered in the front right of the view 420. Audio from, for example, the third and fourth objects 403 and 404 in particular will therefore not match the visual renderings 423 and 424. This can cause confusion to users of the system 400 and may lessen the effect of immersion intended to be provided by the use of spatial audio.

For completeness, FIG. 19 is an example schematic diagram of components of one or more of the modules described previously, in accordance with an example embodiment, which are hereafter referred to generically as processing systems 300. A processing system 300 may comprise a processor 302, a memory 304 closely coupled to the processor and comprised of a RAM 314 and a ROM 312, and, optionally, a user input 310 and a display 318. The processing system 300 may comprise one or more network interfaces 308 for connection to a network, e.g. a modem which may be wired or wireless.

The processor 302 is connected to each of the other components in order to control operation thereof.

The memory 304 may comprise a non-volatile memory, such as a hard disk drive (HDD) or a solid state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithm 150.

The processor 302 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.

The processing system 300 may be a standalone computer, a server, a console, or a network thereof.

In some embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device in order to utilize the software applications stored there. The communication may be processed through the network interface 308.

FIGS. 20a and 20b show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which, when run by a computer, may perform methods according to embodiments described above. The removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code. The memory 366 may be accessed by a computer system via a connector 367. The CD 368 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.

Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware, such as the programmable content of a hardware device, whether instructions for a processor, or configured or configuration settings for a fixed function device, gate array, programmable logic device, etc.

As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagram of FIG. 14 is an example only and that various operations depicted therein may be omitted, reordered and/or combined.

It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.

Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and, during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.

Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.

It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

The invention claimed is:
1. An apparatus, comprising: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: obtaining first video data of a first region using a first camera of a multi-camera device; obtaining second video data of a second region using a second camera of the multi-camera device, the second camera being orientated in a different direction to the first camera, such that the first and second regions are at least partially different; generating a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; modifying the first and second video data to implement the first and second video mappings such that the first and second portions of the video output are narrower than the first and second video data; obtaining audio data using the multi-camera device, at least some of the audio data having a directional component; and generating an audio output from the obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output, wherein the first and second audio mappings correspond to the first and second video mappings respectively.

2. The apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured such that audio data having a directional component outside both the first and second regions is excluded from the audio output.

3. The apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured such that audio data having a directional component outside both the first and second regions is included in the audio output as audio output without a directional component.

4. The apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured such that audio data having a directional component outside both the first and second regions is included in the audio output as audio output with a directional component, wherein the audio data having a directional component outside both the first and second regions undergoes a third audio mapping.

5. The apparatus as claimed in claim 4, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to further perform stretching the audio data having a directional component outside both the first and second regions by the third audio mapping such that the area of the audio output corresponding to the area outside the first and second regions is wider than said area outside said first and second regions.

6. The apparatus as claimed in claim 4, wherein the at least one memory and the computer program code are configured such that the first, second and third audio mappings are such that the audio output provides a 360 degree audio output.

7. The apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured such that the first and second portions of the multi-camera output are presented side-by-side.

8. The apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured such that the first and second portions of the multi-camera output are presented with one data output on top of the other.

9. The apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured such that the obtaining audio data using the multi-camera device comprises using one or more spatial microphones or an array of microphones.

10. The apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to further perform obtaining a user indication of whether audio data associated with the first portion or the second portion of the multi-camera video output is to be boosted or attenuated.

11. The apparatus as claimed in claim 10, wherein the at least one memory and the computer program code are configured such that the user indication is performed by a user contacting the first or the second portion of the multi-camera video output respectively.

12. The apparatus as claimed in claim 1, wherein data obtained from the first camera is obtained from a front camera, or the data obtained from the second camera is obtained from a rear camera.

13. A method, comprising: using a first camera of a multi-camera device to obtain first video data of a first region; using a second camera of the multi-camera device to obtain second video data of a second region, the second camera being orientated in a different direction to the first camera, such that the first and second regions are at least partially different; generating a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; modifying the first and second video data such that the first and second portions of the video output are narrower than the first and second video data; using the multi-camera device to obtain audio data, at least some of the audio data having a directional component; and generating an audio output from the obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output, wherein the first and second audio mappings correspond to the first and second video mappings respectively.

14. The method as claimed in claim 13, wherein audio data having a directional component outside both the first and second regions is: excluded from the audio output; included in the audio output as audio output without a directional component; or included in the audio output as audio output with a directional component, wherein the audio data having a directional component outside both the first and second regions undergoes a third audio mapping.

15. The method as claimed in claim 13, wherein the first and second portions of the multi-camera output are presented side-by-side or with one data output on top of the other.

16. The method as claimed in claim 13, further comprising a user indicating whether audio data associated with the first portion or the second portion of the multi-camera video output is to be boosted or attenuated.

17. A non-transitory computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by at least one processor, causing performance of: obtaining first video data of a first region using a first camera of a multi-camera device; obtaining second video data of a second region using a second camera of the multi-camera device, the second camera being orientated in a different direction to the first camera, such that the first and second regions are at least partially different; generating a multi-camera video output from the first and second video data using a first video mapping to map the first video data to a first portion of the multi-camera video output and using a second video mapping to map the second video data to a second portion of the multi-camera video output; modifying the first and second video data such that the first and second portions of the video output are narrower than the first and second video data; obtaining audio data using the multi-camera device, at least some of the audio data having a directional component; and generating an audio output from the obtained audio data, the audio output comprising an audio output having a directional component within the first portion of the video output and an audio output having a directional component within the second portion of the video output, wherein generating the audio output comprises using a first audio mapping to map audio data having a directional component within the first region to the audio output having a directional component within the first portion of the video output and using a second audio mapping to map audio data having a directional component within the second region to the audio output having a directional component within the second portion of the video output, wherein the first and second audio mappings correspond to the first and second video mappings respectively.